Abstract

We present two new models and their exact analysis for the problem of two processors running the 
Time Warp distributed simulation protocol. Our first model addresses the queueing of messages at each 
processor while the second model adds costs for rollback and state saving. Both models provide insight into 
the operation of free-running systems synchronized by rollback.


Keywords: Discrete Event Simulation, Time Warp, Parallel Processing, Distributed Processing, Simulation,
Optimistic Simulation, Rollback, Speedup, Queueing, Performance Analysis, Markov Chain.


1 Introduction 

The systems which we are able to create become larger and more complex every day. We have moved beyond 
a point where one is able to predict the performance of a large system, be it a complex computer network 
or a super-sonic airplane, by purely analytical means. Therefore, it has become necessary to simulate the 
operation of proposed systems in order to better understand their behavior before huge investments are made 
in their implementation. Additionally, simulation is a useful tool to examine events unlikely to occur in the 
“real world”, such as a nuclear attack. As the size of these simulations increases they demand more computing
time. Naturally then, one would like to utilize the recent advances in parallel computing technology to speed 
up the execution of simulations. Unfortunately, it is a non-trivial task to efficiently implement a parallel 
simulation system, though several techniques have been developed to do so. This paper presents analytical 
models of the performance of one distributed simulation algorithm, Time Warp (TW) [Jef85].

1.1 Previous Work 

Our research focuses on the analysis of the average case behavior of Time Warp when executing on exactly
two processors. In our own previous work [FK91] [KF92] [Kle89] we introduced a new model for the analysis
of two-processor Time Warp. That model did not address message queueing nor did it associate a cost
with rollback. Messages were only used for synchronization. This paper examines message queueing in our
first model (something which has not been addressed in any model) and rollback and state saving costs
in another model. These costs have not been adequately addressed in the previous work on two-processor
models. Lavenberg et al. [LMS83] and Mitra and Mitrani [MM84] have examined models similar to ours,
although messages were only used for synchronization in both those models. Lavenberg et al. derived an
approximation for speedup of two processors over one processor. Mitra and Mitrani, using a discrete time,
continuous state model, solved (as we do) for the distribution of the separation in virtual time between the
two processes. Mitra and Mitrani do introduce the concept of a cost for rollback and optimize the system
based on it. Their technique was to calculate the average forward progress of the system per unit of real time
(D), the average distance rolled back per unit of real time (R), create an objective function J = D - cR, then
optimize the system with respect to J. Unfortunately this is somewhat artificial. The rollback cost should
be an integral part of the model itself. When a process rolls back, it should be forced to pay a time penalty
for rollback. A second criticism is that the objective function utilizes a rollback cost that is proportional to
the distance rolled back. We believe that the cost is, at most, proportional to the log of the distance rolled
back and is probably best approximated by a constant time delay regardless of the distance rolled back.
Additionally, Mitra and Mitrani go on to show how to allow for a different distribution for the size of the
advance in virtual time depending on whether there has been a rollback or not. We discussed in more detail
the relationship of the work of Lavenberg et al. and Mitra and Mitrani to our work in [FK91] and [KF92].

* This work was supported by the Defense Advanced Research Projects Agency under Contract MDA 903-87-C0663, Parallel Systems

Lin and Lazowska [LL90a] have examined Time Warp and conservative methods by appealing to critical
path analysis. Also in [LL90b] they create a model to reduce the state saving overhead in Time Warp.
Though their work provides important insights, it generates different types of results than ours.
Madisetti [Mad89] [MWM90] provides bounds on the performance of a two processor system where the processors
have different speeds of processing and move at constant rates, though again, messages are only used for
synchronization. Madisetti extends his model to multiple processors, something we do not address in this
work. Recently Nicol [Nic91] [Nic90] has attacked the problem of understanding the behavior of massively
parallel simulations, both conservative and optimistic.

1.2 Parallel Discrete Event Simulation 

Parallel Discrete Event Simulation (PDES) is generally accomplished by partitioning the simulation into 
logical processes (LP) which simulate some physical process in the system. Each LP maintains an independent 
local clock indicating how far forward in simulation time it has progressed. Processes interact by sending and 
receiving timestamped messages. Each process operates autonomously by receiving messages, performing 
internal computation and sending messages. A process will terminate once its local clock, the time of 
receipt of the message currently being processed, has reached some user specified value. Certain simulations 
only allow the LP to perform operations in response to messages (the messages carry the work), while other
simulations allow each LP to perform internal computations regardless of whether any messages have arrived. 
For example, an LP which is simulating a single server queue only performs an operation in response to the 
arrival of a message (customer). On the other hand, an LP which simulates a customer arrival process 
operates without receiving any messages at all. Nicol [Nic91] discusses these two types of logical processes
in more detail. 

Each LP could be placed on its own processor, and one might hope that we could then gain speedup
proportional to the number of processors used. Unfortunately, this is often not the case as the system
being simulated may have only limited parallelism [Wag89]. Also, the PDES algorithms themselves limit
parallelism in their attempt to prevent the simulation from deadlocking and to ensure correctness. Several
competing techniques have been developed to address deadlocking and correctness [Mis86] [PWM79]. The
algorithm of interest for this paper is Time Warp [Jef85], an asynchronous approach which uses a rollback
mechanism invoked only when needed for synchronization. The essential problem to address when designing
an algorithm for distributed simulation is to maintain causality between events. In the physical system,
event A might have a direct causal effect on event B. When these two events are executed on two separate
processors, it is non-trivial to efficiently make sure that event A actually occurs before B in real time.
Time Warp maintains this causality by restoring a previous state and re-executing any operations it finds
to have violated causality. The next section describes the algorithm in more detail.

1.3 Time Warp 

The basic idea behind Time Warp is to allow each LP to advance forward as fast as it can without regard
to the operation of the other LPs in the system. A TW process will choose the message with the minimum
timestamp in its input queue; set its local clock to the time on that message; process the message; then find
the next smallest message in the queue, etc. It is possible that a “straggler” message could arrive with a
timestamp less than the local clock time of the LP. When this happens, the process is forced to “roll back”
to a time before the timestamp of the arriving message. This can be accomplished because the system
periodically saves the state of the LP. Any effects of having advanced too far (i.e. erroneous messages) are
canceled through an elegant technique called anti-messages [Jef85]. Any possible gain from the aggressive
behavior of the Time Warp mechanism does not come without a cost. One of these costs is the overhead
associated with the aforementioned state saving. There are two performance tradeoffs to keep in mind when
choosing the frequency of state saving. If we save state very often, we pay a large time penalty in real time
for all the data saving operations. If we choose to save state less often, we run the risk of having to roll
back much further into the simulation time past than the time of the message causing the rollback, thus
paying the time cost of re-executing correct events. Lin and Lazowska [LL90b] address exactly this issue and
find an optimum state saving interval based on certain assumptions about the arrival of messages, state
saving costs, etc. We do not examine this tradeoff in our work. Rather, we force each processor to save state
after the execution of every event so as to keep the model tractable. The other overhead of state saving is
the space required to save the history of the LP. Fortunately, we do not need to keep all state information
back to the beginning of the run. A concept called Global Virtual Time (GVT) [Jef85] allows the system to
periodically throw away obsolete information. GVT is defined as the minimum of all the local LP clocks and
the timestamps of all messages in transit. Since nothing in the system has a timestamp less than GVT, no
process could ever be forced to roll back to a time prior to GVT. GVT is a very difficult measure
to obtain, since we cannot take a “global” snapshot of this distributed system [Lam78]. Algorithms have
been developed to calculate a lower bound on GVT [Bel90] which can be used as an estimate to free up
memory space.
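The GVT definition above can be illustrated with a minimal Python sketch. It assumes an idealized setting in which all local clocks and in-transit timestamps are visible at once, which a real distributed system cannot provide; the function names and the dictionary of saved states are hypothetical and not part of any Time Warp implementation.

```python
def global_virtual_time(local_clocks, in_transit_timestamps):
    """Return GVT: the minimum over all LP clocks and in-transit message timestamps."""
    candidates = list(local_clocks) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("need at least one clock or message timestamp")
    return min(candidates)


def discard_obsolete_states(saved_states, gvt_lower_bound):
    """Drop saved states that can no longer be the target of a rollback.

    saved_states maps virtual time -> saved state (a hypothetical layout).
    The newest state at or before the bound is kept, since a rollback to
    the bound would restore from it; everything older is discarded.
    """
    at_or_before = [t for t in saved_states if t <= gvt_lower_bound]
    if not at_or_before:
        return dict(saved_states)
    keep_from = max(at_or_before)
    return {t: s for t, s in saved_states.items() if t >= keep_from}
```

For instance, with local clocks 12 and 9 and messages in transit stamped 7 and 15, the sketch reports a GVT of 7, and states saved before virtual time 6 in the example dictionary of the test below could be reclaimed.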


2 Message Queueing Model 

We now introduce our model for two processor TW which allows messages which arrive in the virtual time 
future of a process to be queued. Additionally, the messages carry work for the receiving processor. 

2.1 A Model for Two Time Warp Processes 

Assume we have a job which is partitioned into two processes, each of which is executed on a separate 
processor. A process at virtual time v operates by first executing any message in its input queue with
timestamp v and then executing any locally scheduled work. Once completing its local work at virtual time
v, a process advances its clock one unit and will then send a message to the other process with probability
$q_i$. A process places its current virtual time on any message it sends. We will restrict the virtual times in
our system to have integer values (i.e. 0,1,2,...). A process will schedule an event for itself at every point
in virtual time. This means that processes will have their own work to do at every point in virtual time, and
occasionally will have work sent to them from the other process. If a message arrives with a timestamp v
equal to or smaller than the local clock time of the receiving processor, that processor is forced to roll back
(discarding any work performed at a virtual time greater than or equal to v), execute the arriving message,
then proceed forward again from virtual time v. We show the execution sequence for each LP in Figure 1.
Let v be the local clock time kept by the LP and let tm be the timestamp on any arriving message.


1 Set local clock (v) to 0.

2 Execute local events for v=0.

3 With probability q(i), send message stamped with 1.

REPEAT

4 Advance local clock to v=v+1.

5 Process message in queue with timestamp = v (if it exists).

6 Execute local events for time v.

7 With probability q(i), send message stamped with v+1.
UNTIL (v >= MAXTIME)

* If a message arrives at any time with a timestamp (tm <= v):

- set local clock to tm
- goto line 5 and continue from there


Figure 1: Code executed by each processor. 

More formally, we define two processes each executing on a separate processor. As these processes are
executed, we consider that they visit the integers on the x-axis each beginning at x = 0 at time t = 0.
To process a queued message, each processor takes an exponentially distributed amount of time with mean
$1/\beta_i$ ($i = 1,2$). To process its locally generated work takes an exponentially distributed amount of time with
mean $1/\lambda_i$ ($i = 1,2$). We assume that $\beta_i = f\lambda_i$, where $0 < f < \infty$. After process $i$ makes an advance along
the axis, it will send a message to the other process with probability $q_i$ ($i = 1,2$). This message carries
a timestamp which is the time of the sender after making the advance. Upon receiving a message from the
other (sending) process, this (receiving) process will do the following:

1: If its position along the x-axis is behind the sending process, it queues the message.

2: If its position is equal to or ahead of the sending process, it will immediately move back (i.e., “rollback”)
along the x-axis to the current position of the sending process and begin to process that message. All
work completed at virtual times greater than or equal to its current position is discarded and must be
re-executed.
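The two receive cases above can be sketched in a few lines of Python. This is only an illustration of the rule under the stated model; the `Process` class and its field names are hypothetical, and the sketch tracks positions and queued timestamps only, not the work carried by each message.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    position: int = 0                            # current point on the x-axis (virtual time)
    queued: list = field(default_factory=list)   # timestamps of messages waiting in the future
    rollbacks: int = 0

    def receive(self, timestamp):
        """Apply Case 1 (queue) or Case 2 (rollback); return True on rollback."""
        if self.position < timestamp:
            # Case 1: the receiver is behind the sender, so the message waits.
            self.queued.append(timestamp)
            return False
        # Case 2: the receiver is level with or ahead of the sender. It moves
        # back to the sender's position; work at times >= timestamp is redone.
        self.position = timestamp
        self.rollbacks += 1
        return True
```

A receiver at position 5 queues a message stamped 7, but rolls back to position 3 when a message stamped 3 arrives; a message stamped exactly at its current position also triggers Case 2.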

Let $F(t)$ = the position of the First process (process one) at time $t$ and let $S(t)$ = the position of the
Second process (process two) at time $t$. Further, let

\[ D(t) = F(t) - S(t). \]

$D(t) = 0$ whenever Case 2 occurs (i.e., a rollback). We are interested in studying the Markov process $D(t)$.
From our assumptions that $F(0) = S(0) = 0$, we have $D(0) = 0$. Clearly, $D(t)$ can take on any integer value
(i.e., it certainly can go negative; see Figure 2, which shows the position of the two processors at times $t_1$ and
$t_2$).

Figure 2: States of two processors at times $t_1$ and $t_2$, with $D(t_1) = F(t_1) - S(t_1) > 0$ and $D(t_2) = F(t_2) - S(t_2) < 0$.

We will solve for

\[ \lim_{t\to\infty} P[D(t) = k] \qquad -\infty < k < \infty \]

namely, the equilibrium probability for the Markov chain $D(t)$. In order to find the solution, we split the
chain into five regions.

\begin{align*}
P_k &= \lim_{t\to\infty} P[D(t) = k \text{ and Processor 2 is not processing a msg}] & k &\ge 1\\
Q_k &= \lim_{t\to\infty} P[D(t) = -k \text{ and Processor 1 is not processing a msg}] & k &\ge 1\\
S_k &= \lim_{t\to\infty} P[D(t) = k \text{ and Processor 2 is processing a msg}] & k &\ge 0\\
R_k &= \lim_{t\to\infty} P[D(t) = -k \text{ and Processor 1 is processing a msg}] & k &\ge 0\\
N_0 &= \lim_{t\to\infty} P[D(t) = 0 \text{ and neither is processing a msg}]\\
B_0 &= \lim_{t\to\infty} P[D(t) = 0 \text{ and both are processing a msg}]
\end{align*}

Using our solution, we will go on to solve for some interesting performance measures including the average 
rate of progress of the two-processor system. 

There are some implicit assumptions in our description. Our model assumes that states are stored after 
every event, otherwise a rollback would not necessarily send the processor back to the time of the tardy
message; rather it might have to roll back to a much earlier time, namely, that of the last saved state. When
process i causes the other process to roll back, process i immediately discards any messages it has queued in
its future. This is as if the rolled back processor is able to transmit anti-messages instantaneously. This is
not an unrealistic assumption in a shared-memory environment [Fuj89]. Another implicit assumption is that
each process always schedules events for itself. We assume that communication between processors incurs 
no delay from transmission to reception. Finally, the interaction between the processes is probabilistic.

Figure 3: State Diagram for the Message Queueing Model.



2.2 Analysis of the Message Queueing Model 

In this section we provide the exact solution for the continuous time, discrete state model introduced in 
Section 2.1. First, we provide some definitions. 


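\begin{align*}
\lambda_i &= \text{Rate at which Processor $i$ processes local events}\\
\beta_i &= f\lambda_i = \text{Rate at which Processor $i$ processes messages}\\
a &= \frac{\lambda_1}{\lambda_1 + \lambda_2}\\
\bar{a} &= \frac{\lambda_2}{\lambda_1 + \lambda_2} = 1 - a\\
A &= a + \bar{a} f\\
B &= \bar{a} + a f\\
q_i &= P[\text{Processor $i$ sends a message after advancing}]\\
\bar{q}_i &= 1 - q_i
\end{align*}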


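A state diagram for this system is shown in Figure 3.

The balance equations for our system are:

\begin{align}
P_k &= a P_{k-1} + \bar{a}\bar{q}_2\bar{q}_1 P_{k+1} + \bar{a} f S_k & k &\ge 2 \tag{1}\\
P_1 &= a N_0 + \bar{a}\bar{q}_2\bar{q}_1 P_2 + \bar{a} f S_1 \tag{2}\\
Q_k &= \bar{a} Q_{k-1} + a\bar{q}_1\bar{q}_2 Q_{k+1} + a f R_k & k &\ge 2 \tag{3}\\
Q_1 &= \bar{a} N_0 + a\bar{q}_1\bar{q}_2 Q_2 + a f R_1 \tag{4}\\
N_0 &= a\bar{q}_1\bar{q}_2 Q_1 + \bar{a}\bar{q}_2\bar{q}_1 P_1 + a f R_0 + \bar{a} f S_0 \tag{5}\\
f B_0 &= \bar{a} q_2 q_1 \sum_{i=1}^{\infty} P_i + a q_1 q_2 \sum_{i=1}^{\infty} Q_i \tag{6}\\
A S_k &= a S_{k-1} + \bar{a}\bar{q}_2 q_1 P_{k+1} & k &> 0 \tag{7}\\
A S_0 &= \bar{a}\bar{q}_2 q_1 P_1 + a q_1\bar{q}_2 \sum_{i=1}^{\infty} Q_i + a f B_0 \tag{8}\\
B R_k &= \bar{a} R_{k-1} + a\bar{q}_1 q_2 Q_{k+1} & k &> 0 \tag{9}\\
B R_0 &= a\bar{q}_1 q_2 Q_1 + \bar{a} q_2\bar{q}_1 \sum_{i=1}^{\infty} P_i + \bar{a} f B_0 \tag{10}\\
1 &= \sum_{i=1}^{\infty} P_i + \sum_{i=1}^{\infty} Q_i + \sum_{i=0}^{\infty} S_i + \sum_{i=0}^{\infty} R_i + N_0 + B_0 \tag{11}
\end{align}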


This system will have a steady-state solution if $\lambda_i > 0$, $q_i > 0$ and $f > 0$. These are fairly straightforward
restrictions. The $\lambda_i$ must be greater than 0 or the system makes no progress at all. The $q_i$ must be greater
than zero so that there is some probability that a processor will be rolled back once it gets ahead. Finally,
$f$ must be greater than zero so that when a message is being processed the system will eventually complete
the operation.
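The transitions behind the balance equations can be checked independently by simulating the chain. The following Python sketch is not the paper's method (which is an exact analysis); it is a Monte Carlo estimate under the assumption that the state is fully described by $D(t)$ and two flags saying whether each processor is processing a message, with each visited state weighted by its expected holding time. The function name, step count and seed are arbitrary choices.

```python
import random


def simulate_two_processor_chain(lam1, lam2, f, q1, q2, steps=200_000, seed=1):
    """Estimate the equilibrium probabilities of N0 and B0 from the jump chain."""
    rng = random.Random(seed)
    a = lam1 / (lam1 + lam2)
    abar = 1.0 - a
    d = 0                     # D(t) = F(t) - S(t)
    busy1 = busy2 = False     # True while a processor is processing a message
    weight = {"N0": 0.0, "B0": 0.0, "other": 0.0}
    for _ in range(steps):
        rate1 = a * f if busy1 else a
        rate2 = abar * f if busy2 else abar
        total = rate1 + rate2
        if d == 0 and not busy1 and not busy2:
            label = "N0"
        elif d == 0 and busy1 and busy2:
            label = "B0"
        else:
            label = "other"
        weight[label] += 1.0 / total          # expected holding time in this state
        if rng.random() < rate1 / total:      # processor 1 completes an event
            if busy1:
                busy1 = False                 # finished its message, no advance
            elif d >= 0:
                d += 1                        # ahead or level: any message is queued
            elif rng.random() < q1:           # behind and sends: processor 2 rolls back
                d, busy2 = 0, True
                busy1 = rng.random() < q2     # a queued message may be waiting
            else:
                d += 1
                busy1 = rng.random() < q2
        else:                                 # processor 2 completes an event
            if busy2:
                busy2 = False
            elif d <= 0:
                d -= 1
            elif rng.random() < q2:           # behind and sends: processor 1 rolls back
                d, busy1 = 0, True
                busy2 = rng.random() < q1
            else:
                d -= 1
                busy2 = rng.random() < q1
    norm = sum(weight.values())
    return {name: w / norm for name, w in weight.items()}
```

With the parameters of the example in Section 2.4 ($\lambda_1 = 11$, $\lambda_2 = 9$, $f = 1$, $q_1 = 1/2$, $q_2 = 1/3$) the estimates for $N_0$ and $B_0$ should land near the exact values reported there, up to sampling noise.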

We define the following z-transforms (note the different ranges on k):

\begin{align*}
P(z) &= \sum_{k=1}^{\infty} P_k z^k & Q(z) &= \sum_{k=1}^{\infty} Q_k z^k\\
S(z) &= \sum_{k=0}^{\infty} S_k z^k & R(z) &= \sum_{k=0}^{\infty} R_k z^k.
\end{align*}

Using the above equations we can solve for $P(z)$, $Q(z)$, $S(z)$ and $R(z)$ by multiplying the appropriate
equation by $z^k$ and summing over the applicable range of $k$. To simplify the expressions we define the
following:

\begin{align*}
F_S &= \bar{a} q_2 P(1) + (1 - \bar{a} q_2) Q(1)\\
F_R &= a q_1 Q(1) + (1 - a q_1) P(1).
\end{align*}






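Solving for $P(z)$ in terms of $S(z)$ we get

\[ P(z) = \frac{z\left(-A\bar{a} f S(z) + F_S a\bar{a} f q_1 + P_1\bar{a}(A - a q_1)\bar{q}_2 - A N_0 a z\right)}{A\left(\bar{a}\bar{q}_1\bar{q}_2 - z + a z^2\right)} \]

and for $S(z)$ in terms of $P(z)$

\[ S(z) = \frac{q_1\left(P(z)\bar{a}\bar{q}_2 + F_S a z\right)}{z (A - a z)}. \]

Solving them simultaneously we arrive at

\[ P(z) = \frac{-z\left(-F_S a^2\bar{a} f q_1 z + P_1\bar{a}(A - a q_1)\bar{q}_2 (A - a z) - A N_0 a z (A - a z)\right)}{A\left(-\bar{a}(A - a q_1)\bar{q}_2 + (A + a\bar{a}\bar{q}_1\bar{q}_2) z - (1 + A) a z^2 + a^2 z^3\right)} \]

\[ S(z) = -\frac{P_1\bar{a}^2 q_1 (A - a q_1)\bar{q}_2^{\,2} - A N_0 a\bar{a} q_1\bar{q}_2 z + F_S a q_1\left(\bar{a}(A - a q_1)\bar{q}_2 - A z + A a z^2\right)}{A\left(-\bar{a}(A - a q_1)\bar{q}_2 + (A + a\bar{a}\bar{q}_1\bar{q}_2) z - (1 + A) a z^2 + a^2 z^3\right)} \]

The numerator polynomial, $N(z)$, for $P(z)$ is simply

\[ N(z) = -z\left(-F_S a^2\bar{a} f q_1 z + P_1\bar{a}(A - a q_1)\bar{q}_2 (A - a z) - A N_0 a z (A - a z)\right). \]

Moreover, the denominator polynomial, $D(z)$, for $P(z)$ may be factored as follows:

\[ D(z) = A a^2 (z - r_1)(z - r_2)(z - r_3) \]

where $r_1$, $r_2$ and $r_3$ are the roots of the cubic polynomial in $D(z)$.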


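\begin{align*}
r_1 &= \frac{1 + A - 2\sqrt{1 - A + A^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_r + 4\pi}{3}\right)}{3a}\\
r_2 &= \frac{1 + A - 2\sqrt{1 - A + A^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_r}{3}\right)}{3a}\\
r_3 &= \frac{1 + A - 2\sqrt{1 - A + A^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_r + 2\pi}{3}\right)}{3a}
\end{align*}

Symmetric roots ($s_1$, $s_2$, $s_3$) for the denominator of $Q(z)$ can be written down directly

\begin{align*}
s_1 &= \frac{1 + B - 2\sqrt{1 - B + B^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_s + 4\pi}{3}\right)}{3\bar{a}}\\
s_2 &= \frac{1 + B - 2\sqrt{1 - B + B^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_s}{3}\right)}{3\bar{a}}\\
s_3 &= \frac{1 + B - 2\sqrt{1 - B + B^2 - 3a\bar{a}\bar{q}_1\bar{q}_2}\,\cos\left(\frac{\theta_s + 2\pi}{3}\right)}{3\bar{a}}
\end{align*}

where

\begin{align}
\theta_r &= \arccos\left(\frac{-(A - 2)(1 + A)(2A - 1) + 9a\bar{a}\bar{q}_2\left(-3A + 3a q_1 + (1 + A)\bar{q}_1\right)}{2\left(1 - A + A^2 - 3a\bar{a}\bar{q}_1\bar{q}_2\right)^{3/2}}\right) \tag{12}\\
\theta_s &= \arccos\left(\frac{-(B - 2)(1 + B)(2B - 1) + 9a\bar{a}\bar{q}_1\left(-3B + 3\bar{a} q_2 + (1 + B)\bar{q}_2\right)}{2\left(1 - B + B^2 - 3a\bar{a}\bar{q}_1\bar{q}_2\right)^{3/2}}\right) \tag{13}
\end{align}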
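The trigonometric root formulas can be evaluated directly. The Python sketch below assumes the cubic in $D(z)$ is $a^2 z^3 - (1+A) a z^2 + (A + a\bar{a}\bar{q}_1\bar{q}_2) z - \bar{a}(A - a q_1)\bar{q}_2$ and that the three angles are $(\theta_r + 4\pi)/3$, $\theta_r/3$ and $(\theta_r + 2\pi)/3$ for $r_1$, $r_2$, $r_3$; that ordering is the one for which the middle root is the one inside the unit circle. The function names are hypothetical. The symmetric roots $s_1$, $s_2$, $s_3$ follow by swapping $(a, \bar{a})$ and $(q_1, q_2)$ in the call.

```python
import math


def cubic_value(z, a, abar, f, q1, q2):
    """Evaluate the cubic factor of D(z) (without the leading constant A)."""
    A = a + abar * f
    qb1, qb2 = 1.0 - q1, 1.0 - q2
    return (a * a * z**3 - (1.0 + A) * a * z**2
            + (A + a * abar * qb1 * qb2) * z - abar * (A - a * q1) * qb2)


def time_warp_roots(a, abar, f, q1, q2):
    """Return (r1, r2, r3) from the trigonometric solution of the cubic."""
    A = a + abar * f
    qb1, qb2 = 1.0 - q1, 1.0 - q2
    w = 1.0 - A + A * A - 3.0 * a * abar * qb1 * qb2
    num = (-(A - 2.0) * (1.0 + A) * (2.0 * A - 1.0)
           + 9.0 * a * abar * qb2 * (-3.0 * A + 3.0 * a * q1 + (1.0 + A) * qb1))
    # Clamp to guard against rounding just outside the domain of acos.
    theta = math.acos(max(-1.0, min(1.0, num / (2.0 * w**1.5))))

    def root(shift):
        return (1.0 + A - 2.0 * math.sqrt(w) * math.cos((theta + shift) / 3.0)) / (3.0 * a)

    return root(4.0 * math.pi), root(0.0), root(2.0 * math.pi)
```

For $a = 11/20$, $f = 1$, $q_1 = 1/2$, $q_2 = 1/3$ this yields roots close to the values 1.281 and 2.086 that appear in the example of Section 2.4, with the remaining root below one.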
See Appendix A for a derivation of the roots. It can be shown [Fel91] that $r_1$, $r_2$ and $r_3$ are real and that
$|r_2| < 1$ while $|r_1|, |r_3| > 1$. Since $P(z)$ is the z-transform of a probability distribution, it must be analytic
in the range $|z| < 1$, and we know that $N(z)$ must go to zero at $z = r_2$. We can use this fact to solve for
$P_1$, yielding

\[ P_1 = \frac{a r_2\left(F_S a\bar{a} f q_1 + A N_0 (A - a r_2)\right)}{\bar{a}(A - a q_1)\bar{q}_2 (A - a r_2)}. \]

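Substituting this value back into $N(z)$ we may write

\[ N(z) = \frac{A a z (z - r_2)\left(F_S a\bar{a} f q_1 + N_0 (A - a r_2)(A - a z)\right)}{A - a r_2} \]

and thus

\[ P(z) = \frac{z\left(F_S a\bar{a} f q_1 + N_0 (A - a r_2)(A - a z)\right)}{a (A - a r_2)(r_1 - z)(r_3 - z)}. \tag{14} \]

A similar procedure can be carried out on $S(z)$ resulting in

\[ S(z) = \frac{q_1\left(N_0\bar{a}\bar{q}_2 + F_S (1 - a r_2 - a z)\right)}{a (r_1 - z)(r_3 - z)}. \tag{15} \]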


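Moreover, $Q(z)$ and $R(z)$ are symmetric in $(a, \bar{a})$, $(q_1, q_2)$ and $(\beta_1, \beta_2)$ to $P(z)$ and $S(z)$ so we can write them
down directly.

\begin{align}
Q(z) &= \frac{z\left(F_R a\bar{a} f q_2 + N_0 (B - \bar{a} s_2)(B - \bar{a} z)\right)}{\bar{a}(B - \bar{a} s_2)(s_1 - z)(s_3 - z)} \tag{16}\\
R(z) &= \frac{q_2\left(N_0 a\bar{q}_1 + F_R (1 - \bar{a} s_2 - \bar{a} z)\right)}{\bar{a}(s_1 - z)(s_3 - z)} \tag{17}
\end{align}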


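Recalling that $F_S$ and $F_R$ are functions of both $P(1)$ and $Q(1)$, we see that $P(z)$ and $Q(z)$ are functions
of $P(1)$, $Q(1)$ and $N_0$. We solve for $P(1)$ and $Q(1)$ by solving Equations 14 and 16 simultaneously with
$z = 1$.

\[ P(1) = C_p N_0 \qquad Q(1) = C_q N_0 \]

where

\begin{align*}
C_p &= \frac{C_{pn}(1 - C_{qq}) + C_{pq} C_{qn}}{1 - C_{pp} - C_{qq} + C_{pp} C_{qq} - C_{pq} C_{qp}}\\
C_q &= \frac{C_{qn}(1 - C_{pp}) + C_{qp} C_{pn}}{1 - C_{pp} - C_{qq} + C_{pp} C_{qq} - C_{pq} C_{qp}}
\end{align*}

and

\begin{align*}
C_{pn} &= \frac{\bar{a} f}{a (r_1 - 1)(r_3 - 1)}\\
C_{pp} &= \frac{\bar{a}^2 f q_1 q_2}{(r_1 - 1)(A - a r_2)(r_3 - 1)}\\
C_{pq} &= \frac{\bar{a} f q_1 (1 - \bar{a} q_2)}{(r_1 - 1)(A - a r_2)(r_3 - 1)}\\
C_{qn} &= \frac{a f}{\bar{a}(s_1 - 1)(s_3 - 1)}\\
C_{qp} &= \frac{a f (1 - a q_1) q_2}{(s_1 - 1)(B - \bar{a} s_2)(s_3 - 1)}\\
C_{qq} &= \frac{a^2 f q_1 q_2}{(s_1 - 1)(B - \bar{a} s_2)(s_3 - 1)}.
\end{align*}

Noting that $P(1) + Q(1) + S(1) + R(1) + N_0 + B_0 = 1$ we solve for $N_0$.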

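\begin{align}
N_0 = \Bigg[ 1 &+ \frac{(C_q a + C_p\bar{a}) q_1 q_2}{f}
+ \frac{C_{FS} a\bar{a} f q_1 + \bar{a} f (A - a r_2)}{a (r_1 - 1)(A - a r_2)(r_3 - 1)}
+ \frac{q_1\left(\bar{a}\bar{q}_2 + C_{FS}(\bar{a} - a r_2)\right)}{a (r_1 - 1)(r_3 - 1)} \nonumber\\
&+ \frac{C_{FR} a\bar{a} f q_2 + a f (B - \bar{a} s_2)}{\bar{a}(s_1 - 1)(B - \bar{a} s_2)(s_3 - 1)}
+ \frac{q_2\left(a\bar{q}_1 + C_{FR}(a - \bar{a} s_2)\right)}{\bar{a}(s_1 - 1)(s_3 - 1)} \Bigg]^{-1} \tag{18}
\end{align}

Finally, by inverting the transforms we find the probability of being in any state (other than $N_0$).

\begin{align}
P_k &= K_1\left(\frac{1}{r_1}\right)^k + K_2\left(\frac{1}{r_3}\right)^k & k &\ge 1 \tag{19}\\
Q_k &= K_3\left(\frac{1}{s_1}\right)^k + K_4\left(\frac{1}{s_3}\right)^k & k &\ge 1 \tag{20}\\
S_k &= K_5\left(\frac{1}{r_1}\right)^k + K_6\left(\frac{1}{r_3}\right)^k & k &\ge 0 \tag{21}\\
R_k &= K_7\left(\frac{1}{s_1}\right)^k + K_8\left(\frac{1}{s_3}\right)^k & k &\ge 0 \tag{22}\\
B_0 &= \frac{N_0 (C_q a + C_p\bar{a}) q_1 q_2}{f} \tag{23}
\end{align}

where

\begin{align*}
K_1 &= \frac{N_0\left(C_{FS} a\bar{a} f q_1 + (A - a r_1)(A - a r_2)\right)}{a (A - a r_2)(r_3 - r_1)}\\
K_2 &= \frac{N_0\left(C_{FS} a\bar{a} f q_1 + (A - a r_2)(A - a r_3)\right)}{a (A - a r_2)(r_1 - r_3)}\\
K_3 &= \frac{N_0\left(C_{FR} a\bar{a} f q_2 + (B - \bar{a} s_1)(B - \bar{a} s_2)\right)}{\bar{a}(B - \bar{a} s_2)(s_3 - s_1)}\\
K_4 &= \frac{N_0\left(C_{FR} a\bar{a} f q_2 + (B - \bar{a} s_2)(B - \bar{a} s_3)\right)}{\bar{a}(B - \bar{a} s_2)(s_1 - s_3)}\\
K_5 &= \frac{N_0 q_1\left(\bar{a}\bar{q}_2 + C_{FS}(1 - a r_1 - a r_2)\right)}{a r_1 (r_3 - r_1)}\\
K_6 &= \frac{N_0 q_1\left(\bar{a}\bar{q}_2 + C_{FS}(1 - a r_2 - a r_3)\right)}{a (r_1 - r_3) r_3}\\
K_7 &= \frac{N_0 q_2\left(a\bar{q}_1 + C_{FR}(1 - \bar{a} s_1 - \bar{a} s_2)\right)}{\bar{a} s_1 (s_3 - s_1)}\\
K_8 &= \frac{N_0 q_2\left(a\bar{q}_1 + C_{FR}(1 - \bar{a} s_2 - \bar{a} s_3)\right)}{\bar{a}(s_1 - s_3) s_3}\\
C_{FS} &= C_p\bar{a} q_2 + C_q (1 - \bar{a} q_2)\\
C_{FR} &= C_q a q_1 + C_p (1 - a q_1)
\end{align*}

This completes our calculation of the explicit expressions for the equilibrium state probabilities of our chain.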

2.3 Performance Measures 

Using the solution to the Markov chain which was calculated above, we may solve for any performance
measure of interest. In the following sections we examine a few important ones.
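Before turning to the individual measures, it may help to see the closed-form solution of Section 2.2 evaluated numerically. The Python sketch below is an illustration under assumptions rather than the authors' code: it takes $A = a + \bar{a} f$, $B = \bar{a} + a f$, obtains the roots from the trigonometric formulas (with $r_2$, $s_2$ the roots inside the unit circle), solves the two linear equations for $C_p$ and $C_q$, and then normalizes to get $N_0$. All function and key names are hypothetical.

```python
import math


def _roots(a, abar, f, q1, q2):
    """Roots (r1, r2, r3) of the cubic in D(z); r2 is the root inside the unit circle."""
    A = a + abar * f
    qb1, qb2 = 1.0 - q1, 1.0 - q2
    w = 1.0 - A + A * A - 3.0 * a * abar * qb1 * qb2
    num = (-(A - 2.0) * (1.0 + A) * (2.0 * A - 1.0)
           + 9.0 * a * abar * qb2 * (-3.0 * A + 3.0 * a * q1 + (1.0 + A) * qb1))
    theta = math.acos(max(-1.0, min(1.0, num / (2.0 * w**1.5))))
    return tuple((1.0 + A - 2.0 * math.sqrt(w) * math.cos((theta + s) / 3.0)) / (3.0 * a)
                 for s in (4.0 * math.pi, 0.0, 2.0 * math.pi))


def solve_queueing_model(lam1, lam2, f, q1, q2):
    """Return N0, B0 and the constants K1, K2 of the message queueing model."""
    a = lam1 / (lam1 + lam2)
    ab = 1.0 - a
    qb1, qb2 = 1.0 - q1, 1.0 - q2
    A, B = a + ab * f, ab + a * f
    r1, r2, r3 = _roots(a, ab, f, q1, q2)
    s1, s2, s3 = _roots(ab, a, f, q2, q1)      # symmetric roots by swapping roles
    dr, ds = (r1 - 1.0) * (r3 - 1.0), (s1 - 1.0) * (s3 - 1.0)
    c_pn = ab * f / (a * dr)
    c_pp = ab * ab * f * q1 * q2 / (dr * (A - a * r2))
    c_pq = ab * f * q1 * (1.0 - ab * q2) / (dr * (A - a * r2))
    c_qn = a * f / (ab * ds)
    c_qp = a * f * q2 * (1.0 - a * q1) / (ds * (B - ab * s2))
    c_qq = a * a * f * q1 * q2 / (ds * (B - ab * s2))
    det = (1.0 - c_pp) * (1.0 - c_qq) - c_pq * c_qp
    c_p = (c_pn * (1.0 - c_qq) + c_pq * c_qn) / det   # P(1) = c_p * N0
    c_q = (c_qn * (1.0 - c_pp) + c_qp * c_pn) / det   # Q(1) = c_q * N0
    c_fs = c_p * ab * q2 + c_q * (1.0 - ab * q2)
    c_fr = c_q * a * q1 + c_p * (1.0 - a * q1)
    s_sum = q1 * (ab * qb2 + c_fs * (ab - a * r2)) / (a * dr)     # S(1) / N0
    r_sum = q2 * (a * qb1 + c_fr * (a - ab * s2)) / (ab * ds)     # R(1) / N0
    b_ratio = (c_q * a + c_p * ab) * q1 * q2 / f                  # B0 / N0
    n0 = 1.0 / (1.0 + c_p + c_q + s_sum + r_sum + b_ratio)
    common = c_fs * a * ab * f * q1
    k1 = n0 * (common + (A - a * r1) * (A - a * r2)) / (a * (A - a * r2) * (r3 - r1))
    k2 = n0 * (common + (A - a * r2) * (A - a * r3)) / (a * (A - a * r2) * (r1 - r3))
    return {"N0": n0, "B0": n0 * b_ratio, "K1": k1, "K2": k2,
            "P1_total": n0 * c_p, "Q1_total": n0 * c_q,
            "S1_total": n0 * s_sum, "R1_total": n0 * r_sum}
```

Calling `solve_queueing_model(11, 9, 1.0, 0.5, 1/3)` gives values of $N_0$ and $B_0$ that agree with the example in Section 2.4 to the precision printed there. In that example $K_2$ comes out negative, which is what makes the listed state probabilities sum to one.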

2.3.1 State Buffer Use 

When a processor completes its local processing it advances its clock by one time unit. Therefore, if a
processor is ahead by $k$ units of virtual time ($k$ units of distance on the axis), then it will need to have saved
$k$ states. The expected number of buffers ($\bar{B}_i$) needed to save state at each processor can be found from

\begin{align*}
\bar{B}_1 &= \sum_{i=1}^{\infty} i (P_i + S_i) = \frac{(K_1 + K_5) r_1}{(r_1 - 1)^2} + \frac{(K_2 + K_6) r_3}{(r_3 - 1)^2}\\
\bar{B}_2 &= \sum_{i=1}^{\infty} i (Q_i + R_i) = \frac{(K_3 + K_7) s_1}{(s_1 - 1)^2} + \frac{(K_4 + K_8) s_3}{(s_3 - 1)^2}.
\end{align*}

More interestingly, we find that the probability that a fixed size buffer of size $b \ge 1$ overflows at processor $i$
($\Theta_{i,b}$) is

\begin{align*}
\Theta_{1,b} &= \sum_{i=b+1}^{\infty} (P_i + S_i) = \frac{K_1 + K_5}{r_1^{\,b}(r_1 - 1)} + \frac{K_2 + K_6}{r_3^{\,b}(r_3 - 1)}\\
\Theta_{2,b} &= \sum_{i=b+1}^{\infty} (Q_i + R_i) = \frac{K_3 + K_7}{s_1^{\,b}(s_1 - 1)} + \frac{K_4 + K_8}{s_3^{\,b}(s_3 - 1)}.
\end{align*}

2.3.2 Message Queue Distribution 

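Messages which arrive in the virtual time future are queued until the processor completes all work with a
virtual time less than that of the arriving message. We define the size of the message queue as the number of messages
queued in the virtual time future of the processor, plus any message which is currently being processed. The
distribution of message queue length at each processor is found by summing over the appropriate ranges of
the state probabilities.

\begin{align}
m_{1,k} &= P[k \text{ msgs queued at Processor 1}] \nonumber\\
&= \sum_{i=k}^{\infty} Q_i \binom{i}{k} q_2^{\,k}\bar{q}_2^{\,i-k} + \sum_{i=k-1}^{\infty} R_i \binom{i}{k-1} q_2^{\,k-1}\bar{q}_2^{\,i-k+1} \qquad k \ge 2 \nonumber\\
&= \frac{K_3 q_2^{\,k} s_1}{(s_1 - \bar{q}_2)^{k+1}} + \frac{K_7 q_2^{\,k-1} s_1}{(s_1 - \bar{q}_2)^{k}} + \frac{K_4 q_2^{\,k} s_3}{(s_3 - \bar{q}_2)^{k+1}} + \frac{K_8 q_2^{\,k-1} s_3}{(s_3 - \bar{q}_2)^{k}} \tag{24}\\
m_{1,1} &= \sum_{i=1}^{\infty} Q_i\, i\, q_2\bar{q}_2^{\,i-1} + \sum_{i=0}^{\infty} R_i\bar{q}_2^{\,i} + B_0 \nonumber\\
&= \frac{K_3 q_2 s_1}{(s_1 - \bar{q}_2)^2} + \frac{K_7 s_1}{s_1 - \bar{q}_2} + \frac{K_4 q_2 s_3}{(s_3 - \bar{q}_2)^2} + \frac{K_8 s_3}{s_3 - \bar{q}_2} + B_0 \nonumber\\
m_{1,0} &= P(1) + S(1) + N_0 + \sum_{i=1}^{\infty} Q_i\bar{q}_2^{\,i} \nonumber\\
&= P(1) + S(1) + N_0 + \frac{K_3\bar{q}_2}{s_1 - \bar{q}_2} + \frac{K_4\bar{q}_2}{s_3 - \bar{q}_2} \nonumber\\
m_{2,k} &= P[k \text{ msgs queued at Processor 2}] \nonumber\\
&= \sum_{i=k}^{\infty} P_i \binom{i}{k} q_1^{\,k}\bar{q}_1^{\,i-k} + \sum_{i=k-1}^{\infty} S_i \binom{i}{k-1} q_1^{\,k-1}\bar{q}_1^{\,i-k+1} \qquad k \ge 2 \nonumber\\
&= \frac{K_1 q_1^{\,k} r_1}{(r_1 - \bar{q}_1)^{k+1}} + \frac{K_5 q_1^{\,k-1} r_1}{(r_1 - \bar{q}_1)^{k}} + \frac{K_2 q_1^{\,k} r_3}{(r_3 - \bar{q}_1)^{k+1}} + \frac{K_6 q_1^{\,k-1} r_3}{(r_3 - \bar{q}_1)^{k}} \tag{25}\\
m_{2,1} &= \sum_{i=1}^{\infty} P_i\, i\, q_1\bar{q}_1^{\,i-1} + \sum_{i=0}^{\infty} S_i\bar{q}_1^{\,i} + B_0 \nonumber\\
&= \frac{K_1 q_1 r_1}{(r_1 - \bar{q}_1)^2} + \frac{K_5 r_1}{r_1 - \bar{q}_1} + \frac{K_2 q_1 r_3}{(r_3 - \bar{q}_1)^2} + \frac{K_6 r_3}{r_3 - \bar{q}_1} + B_0 \nonumber\\
m_{2,0} &= Q(1) + R(1) + N_0 + \sum_{i=1}^{\infty} P_i\bar{q}_1^{\,i} \nonumber\\
&= Q(1) + R(1) + N_0 + \frac{K_1\bar{q}_1}{r_1 - \bar{q}_1} + \frac{K_2\bar{q}_1}{r_3 - \bar{q}_1} \nonumber
\end{align}

The mean number of message buffers needed at each processor is

\begin{align*}
\bar{m}_1 &= \sum_{i=0}^{\infty} i\, m_{1,i} = \frac{s_1\left(K_3 q_2 + K_7 (s_1 - \bar{q}_2)\right)}{(s_1 - 1)^2} + \frac{s_3\left(K_4 q_2 + K_8 (s_3 - \bar{q}_2)\right)}{(s_3 - 1)^2} + B_0\\
\bar{m}_2 &= \sum_{i=0}^{\infty} i\, m_{2,i} = \frac{r_1\left(K_1 q_1 + K_5 (r_1 - \bar{q}_1)\right)}{(r_1 - 1)^2} + \frac{r_3\left(K_2 q_1 + K_6 (r_3 - \bar{q}_1)\right)}{(r_3 - 1)^2} + B_0.
\end{align*}

2.3.3 Normalized Rate of Progress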

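From the complete solution of the Markov chain we calculate the average rate of progress of the two processor
system. We define $\mathcal{R}$ as the average rate of progress in virtual time of the two-processor system. This value
is simply the average “unfettered” rate of progress of the two processors minus the average rollback rate.

\begin{align}
\mathcal{R} &= (\lambda_1 + \lambda_2)\left(\sum_{k=1}^{\infty} Q_k + N_0 + \sum_{k=1}^{\infty} P_k\right) + \lambda_1 \sum_{k=0}^{\infty} S_k + \lambda_2 \sum_{k=0}^{\infty} R_k \nonumber\\
&\quad - \lambda_2 q_2 \sum_{k=1}^{\infty} P_k (k - 1) - \lambda_1 q_1 \sum_{k=1}^{\infty} Q_k (k - 1) \nonumber\\
&= (\lambda_1 + \lambda_2)\left(\frac{K_1}{r_1 - 1} + \frac{K_2}{r_3 - 1} + N_0 + \frac{K_3}{s_1 - 1} + \frac{K_4}{s_3 - 1}\right) \nonumber\\
&\quad + \lambda_1\left(\frac{K_5 r_1}{r_1 - 1} + \frac{K_6 r_3}{r_3 - 1}\right) + \lambda_2\left(\frac{K_7 s_1}{s_1 - 1} + \frac{K_8 s_3}{s_3 - 1}\right) \nonumber\\
&\quad - \lambda_2 q_2\left(\frac{K_1}{(r_1 - 1)^2} + \frac{K_2}{(r_3 - 1)^2}\right) - \lambda_1 q_1\left(\frac{K_3}{(s_1 - 1)^2} + \frac{K_4}{(s_3 - 1)^2}\right) \tag{28}
\end{align}

We can calculate a “normalized” rate of progress ($\Gamma$) by dividing the above equation by $(\lambda_1 + \lambda_2)$. We arrive
at

\begin{align}
\Gamma &= \frac{K_1}{r_1 - 1} + \frac{K_2}{r_3 - 1} + N_0 + \frac{K_3}{s_1 - 1} + \frac{K_4}{s_3 - 1} \nonumber\\
&\quad + a\left(\frac{K_5 r_1}{r_1 - 1} + \frac{K_6 r_3}{r_3 - 1}\right) + \bar{a}\left(\frac{K_7 s_1}{s_1 - 1} + \frac{K_8 s_3}{s_3 - 1}\right) \nonumber\\
&\quad - \bar{a} q_2\left(\frac{K_1}{(r_1 - 1)^2} + \frac{K_2}{(r_3 - 1)^2}\right) - a q_1\left(\frac{K_3}{(s_1 - 1)^2} + \frac{K_4}{(s_3 - 1)^2}\right) \tag{29}
\end{align}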

It is interesting to note that as $f \to \infty$ the message processing time approaches zero, therefore messages
are only used for synchronization and our system reduces to our original model [Kle89]. In Figure 4 we show
the value for $\Gamma$ when $a = 1/2$ and $q_1 = q_2 = q$, which we refer to as the Symmetric, Balanced case. The figure
shows $\Gamma$ versus $q$ for various values of $f$. We see that for the best performance we want $q$ to be small and
$f \to \infty$. This is the case where there is little interaction between the processors and it takes zero time to
process a message from the other processor. By setting $f = 1$ we can examine $\Gamma$ versus $q$ only. This plot is
shown in Figure 5 compared to the average rate of progress for the same system where messages are only
used for synchronization ($f = \infty$). We see that the system where messages carry work performs more poorly
than where they are only used for synchronization. This is no surprise since there is more work to do. It is
interesting to note that this system is not twice as bad as the synchronization-only system even at $q = 1$. In
fact, at $q = 1$ we can verify the $\Gamma$ result for $f = 1$ by realizing that each processor will always have a message
to process. Therefore, the rate of progress at each step is governed by the maximum time it takes for the
two processors to each finish a message and local work. This is simply the expected value of the maximum
of two 2-stage Erlangs at rate $\lambda$, which is equal to $\frac{11}{4\lambda}$. Taking the reciprocal and dividing by $\lambda$ to find the
rate, we get $\Gamma = 4/11$, which is the value plotted in Figure 5.
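The value $11/(4\lambda)$ for the expected maximum of two independent 2-stage Erlang variables can be confirmed numerically. The Python sketch below is a check added for illustration; it integrates $1 - F(t)^2$, where $F(t) = 1 - e^{-\lambda t}(1 + \lambda t)$ is the 2-stage Erlang distribution function, using the trapezoid rule. The truncation point and step count are arbitrary assumptions.

```python
import math


def erlang2_cdf(t, rate):
    """Distribution function of a 2-stage Erlang with the given stage rate."""
    return 1.0 - math.exp(-rate * t) * (1.0 + rate * t)


def expected_max_of_two(rate, upper=60.0, steps=200_000):
    """E[max(X, Y)] for independent 2-stage Erlangs, via the integral of 1 - F(t)^2."""
    h = (upper / rate) / steps
    total = 0.0
    for n in range(steps + 1):
        weight = 0.5 if n in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * (1.0 - erlang2_cdf(n * h, rate) ** 2)
    return total * h


def normalized_rate_at_q1(rate):
    """Reciprocal of the expected step time, divided by the stage rate."""
    return 1.0 / (rate * expected_max_of_two(rate))
```

For a stage rate of 1 the integral evaluates to 2.75, and the normalized rate to $4/11 \approx 0.3636$, independent of the rate chosen.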
Figure 4: $\Gamma$ versus $f$ and $q$ for the Symmetric, Balanced Case.

2.4 A Specific Example

To better understand the above results we explicitly calculate values of our performance measures for a
specific instance of the parameters of our system. The values chosen are given below.

\begin{align*}
\lambda_1 &= 11 & \lambda_2 &= 9\\
a &= \frac{11}{20} & \bar{a} &= \frac{9}{20}\\
f &= 1\\
q_1 &= \frac{1}{2} & q_2 &= \frac{1}{3}
\end{align*}

Note that processor one will move slightly faster than processor two while the cost of processing a message is
the same as processing a locally generated event. Finally, processor one will send a message with probability
1/2 while processor 2 will send a message with probability 1/3 after advancing.

Figure 5: $\Gamma$ versus $q$ for the Symmetric, Balanced Case.


2.4.1 State Probabilities and State Buffer Use 

The resulting equations for the probability of being in any state are

\begin{align*}
N_0 &\approx 0.0781\\
B_0 &\approx 0.0423\\
P_k &\approx \frac{0.114}{1.281^k} - \frac{0.0359}{2.086^k} & k &\ge 1\\
Q_k &\approx \frac{0.1385}{1.702^k} - \frac{0.0605}{2.468^k} & k &\ge 1\\
S_k &\approx \frac{0.0452}{1.281^k} + \frac{0.0175}{2.086^k} & k &\ge 0\\
R_k &\approx \frac{0.0319}{1.702^k} + \frac{0.0203}{2.468^k} & k &\ge 0.
\end{align*}

These probabilities are plotted in Figure 6. As you would expect, $P_k > Q_k$ and $S_k > R_k$ since processor one
is moving at a faster rate than processor two. The expected number of buffers needed to save state at each
processor ($\bar{B}_i$) is given by

\begin{align*}
\bar{B}_1 &= \sum_{i=1}^{\infty} i (P_i + S_i) \approx 2.5489\\
\bar{B}_2 &= \sum_{i=1}^{\infty} i (Q_i + R_i) \approx 0.5429.
\end{align*}

From the values for $\Theta_{1,b}$ and $\Theta_{2,b}$

\begin{align*}
\Theta_{1,b} &\approx \frac{0.5663}{1.281^b} - \frac{0.0169}{2.086^b}\\
\Theta_{2,b} &\approx \frac{0.2428}{1.702^b} - \frac{0.0273}{2.468^b}
\end{align*}

we find that with probability > 0.99 processor one will not need more than seventeen buffers. A similar
value can be found for processor two.

P[Processor 1 needs > 17 state buffers] ≈ 0.00841 < 0.01
P[Processor 2 needs > 6 state buffers] ≈ 0.00988 < 0.01
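The buffer sizes quoted above can be reproduced by scanning the overflow probabilities for the first size that falls below the 1% target. The following Python sketch uses the rounded constants of this example, with the second term of each $\Theta_{i,b}$ subtracted; the helper name `smallest_buffer` is hypothetical.

```python
def theta1(b):
    """Overflow probability of a state buffer of size b at processor 1 (rounded constants)."""
    return 0.5663 / 1.281**b - 0.0169 / 2.086**b


def theta2(b):
    """Overflow probability of a state buffer of size b at processor 2 (rounded constants)."""
    return 0.2428 / 1.702**b - 0.0273 / 2.468**b


def smallest_buffer(theta, limit=0.01, max_size=1000):
    """Return the smallest buffer size whose overflow probability is below the limit."""
    for b in range(1, max_size + 1):
        if theta(b) < limit:
            return b
    raise ValueError("no buffer size up to max_size meets the limit")
```

With these constants the scan stops at 17 buffers for processor one and 6 for processor two, matching the two probabilities listed above.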























Figure 7: Distribution of the number of messages queued at each processor. 


2.4.2 Message Queue Distribution and Buffer Use 
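The distribution of messages at each processor is given below.

\begin{align*}
m_{1,0} &\approx 0.7569\\
m_{1,1} &\approx 0.1805\\
m_{1,k} &\approx \sum_{i=k-1}^{\infty} \left(\frac{1}{3}\right)^{k-1}\left(\frac{2}{3}\right)^{i-k+1}\left(\frac{0.0319}{1.702^i} + \frac{0.0203}{2.468^i}\right)\binom{i}{k-1}\\
&\quad + \sum_{i=k}^{\infty} \left(\frac{1}{3}\right)^{k}\left(\frac{2}{3}\right)^{i-k}\left(\frac{0.1385}{1.702^i} - \frac{0.0605}{2.468^i}\right)\binom{i}{k} & k &\ge 2\\
m_{2,0} &\approx 0.4074\\
m_{2,1} &\approx 0.2441\\
m_{2,k} &\approx \sum_{i=k-1}^{\infty} \left(\frac{1}{2}\right)^{i}\left(\frac{0.0452}{1.281^i} + \frac{0.0175}{2.086^i}\right)\binom{i}{k-1}\\
&\quad + \sum_{i=k}^{\infty} \left(\frac{1}{2}\right)^{i}\left(\frac{0.114}{1.281^i} - \frac{0.0359}{2.086^i}\right)\binom{i}{k} & k &\ge 2
\end{align*}

The values of these functions are plotted in Figure 7. The mean number of message buffers needed at each
processor is

\begin{align*}
\bar{m}_1 &\approx 0.3346\\
\bar{m}_2 &\approx 1.5562.
\end{align*}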
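The binomial sums for the queue length distribution collapse to closed forms, because $\sum_{i \ge k} \binom{i}{k} x^{i-k} = (1 - x)^{-(k+1)}$. The Python sketch below compares a truncated series with the closed form for processor one, using the rounded constants of this example with $K_4$ entered as negative; the function names and the truncation length are assumptions made for illustration.

```python
from math import comb

Q2, QBAR2 = 1 / 3, 2 / 3
S1, S3 = 1.702, 2.468
K3, K4, K7, K8 = 0.1385, -0.0605, 0.0319, 0.0203


def q_state(i):
    """Q_i: processor 1 is i behind and not processing a message."""
    return K3 / S1**i + K4 / S3**i


def r_state(i):
    """R_i: processor 1 is i behind and processing a message."""
    return K7 / S1**i + K8 / S3**i


def m1_series(k, terms=400):
    """P[k msgs at processor 1], k >= 2, by summing the binomial series directly."""
    total = 0.0
    for i in range(k - 1, k - 1 + terms):
        total += r_state(i) * comb(i, k - 1) * Q2 ** (k - 1) * QBAR2 ** (i - k + 1)
        if i >= k:
            total += q_state(i) * comb(i, k) * Q2**k * QBAR2 ** (i - k)
    return total


def m1_closed(k):
    """The same probability from the closed-form expression."""
    return (K3 * Q2**k * S1 / (S1 - QBAR2) ** (k + 1)
            + K7 * Q2 ** (k - 1) * S1 / (S1 - QBAR2) ** k
            + K4 * Q2**k * S3 / (S3 - QBAR2) ** (k + 1)
            + K8 * Q2 ** (k - 1) * S3 / (S3 - QBAR2) ** k)
```

The two functions agree to floating-point precision for small $k$, which gives some confidence that the closed form and the series describe the same distribution.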
As with the state buffers we can find the number of message buffers needed to store messages such that the
buffers will overflow with probability < 0.01.

P[Processor 1 needs > 3 message buffers] ≈ 0.0063 < 0.01
P[Processor 2 needs > 9 message buffers] ≈ 0.0097 < 0.01

Finally, the value for the normalized rate of progress is $\Gamma \approx 0.5071$.
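The normalized rate of progress can be recomputed from the rounded constants of this example. The Python sketch below evaluates the closed form of Equation 29 under the assumption that $K_2$ and $K_4$ are negative, which is the sign needed for the state probabilities to sum to one; because the inputs are rounded to three or four digits, only approximate agreement with 0.5071 should be expected. The names are hypothetical.

```python
A_FRAC, A_BAR = 11 / 20, 9 / 20
Q1, Q2 = 1 / 2, 1 / 3
N0 = 0.0781
R1, R3, S1, S3 = 1.281, 2.086, 1.702, 2.468
K = {1: 0.114, 2: -0.0359, 3: 0.1385, 4: -0.0605,
     5: 0.0452, 6: 0.0175, 7: 0.0319, 8: 0.0203}


def normalized_rate_of_progress():
    """Evaluate the normalized rate of progress from the example constants."""
    # States in which both processors advance: P_k, Q_k and N_0.
    both = K[1] / (R1 - 1) + K[2] / (R3 - 1) + N0 + K[3] / (S1 - 1) + K[4] / (S3 - 1)
    # States in which only one processor advances: S_k (processor 1), R_k (processor 2).
    one = (A_FRAC * (K[5] * R1 / (R1 - 1) + K[6] * R3 / (R3 - 1))
           + A_BAR * (K[7] * S1 / (S1 - 1) + K[8] * S3 / (S3 - 1)))
    # Virtual time lost to rollbacks of the processor that is ahead.
    lost = (A_BAR * Q2 * (K[1] / (R1 - 1) ** 2 + K[2] / (R3 - 1) ** 2)
            + A_FRAC * Q1 * (K[3] / (S1 - 1) ** 2 + K[4] / (S3 - 1) ** 2))
    return both + one - lost
```

The result is within rounding error of the value quoted above.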

2.5 Summary 

We introduced and solved exactly a new model for two-processor Time Warp operation. The importance 
of our new model is that it explicitly accounts for the work that must be performed by each processor in 
response to the receipt of a message. Messages that arrive in the past cause rollbacks, while messages that 
arrive in the future are queued until the LP moves forward in simulation time. In all cases the messages 
create work for the LP. 

With the complete Markov chain solution we calculated the normalized rate of progress of the two 
processors, and the distribution of the number of messages queued at each processor. Further, we found the 
expected number of buffers needed to save state and/or messages at each processor. Since we have the exact
solution to the complete Markov chain we can calculate nearly any parameter which might be of interest. 

3 A Model for Rollback and State Saving Costs 

If the costs for rollback and/or state saving are high, TW may perform poorly. The following sections 
examine the two-processor system when we account for rollback and state saving costs.

3.1 The Model 

We use a model similar to the one introduced in Section 2.1, a continuous time, discrete state model where
each processor makes only single step state advances whenever it advances. Right after a processor is forced
to roll back, it pays a cost for restoring state by making the expected rate of forward progress smaller than
normal for one event. When processing the “rollback event” each processor moves at a rate $\beta_i = f\lambda_i$ where
$0 < f < 1$. Once this event is completed, the processor moves again at its normal rate of $\lambda_i$. Note that when
$f = 1$ there is no additional cost for rollback and this model reduces to the one in [Kle89]. We add a cost
for state saving in Section 3.3.2.

To solve the system we separate the Markov chain into five different regions again. 

\[
\begin{aligned}
P_k &= \lim_{t\to\infty} P[D(t) = k \text{ and Processor 2 is not in a rollback state}], & k &\ge 1 \\
Q_k &= \lim_{t\to\infty} P[D(t) = -k \text{ and Processor 1 is not in a rollback state}], & k &\ge 1 \\
S_k &= \lim_{t\to\infty} P[D(t) = k \text{ and Processor 2 is in a rollback state}], & k &\ge 0 \\
R_k &= \lim_{t\to\infty} P[D(t) = -k \text{ and Processor 1 is in a rollback state}], & k &\ge 0 \\
P_0 &= \lim_{t\to\infty} P[D(t) = 0 \text{ and neither processor is in a rollback state}]
\end{aligned}
\]

Figure 8: State Diagram for the Rollback Cost Model.

3.2 Analysis of the Cost Model 

In this section we find the exact solution for the model which addresses rollback and state saving costs. The 
parameters of this system are 

\[
\begin{aligned}
\lambda_i &= \text{Rate at which Processor } i \text{ processes events} \\
\beta_i &= f\lambda_i = \text{Rate at which Processor } i \text{ processes after a rollback} \\
a &= \frac{\lambda_1}{\lambda_1 + \lambda_2} \\
\bar{a} &= \frac{\lambda_2}{\lambda_1 + \lambda_2} = 1 - a \\
A &= a + \bar{a} f \\
B &= \bar{a} + a f \\
q_i &= P[\text{Processor } i \text{ sends a message after advancing}] \\
\bar{q}_i &= 1 - q_i
\end{aligned}
\]

A state diagram is shown in Figure 8. Note that the $S_0$ and $R_0$ states were duplicated to keep the figure from
being too cluttered with transition arcs. As with the previous model, this system will have an equilibrium
solution when $\lambda_i > 0$, $q_i > 0$ and $f > 0$.
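The balance equations for this new system are

\begin{align}
(\lambda_1 + \beta_2) S_k &= \lambda_1 S_{k-1}, \qquad k \ge 1 \tag{30} \\
(\lambda_1 + \beta_2) S_0 &= \lambda_1 q_1 \sum_{i=1}^{\infty} Q_i + \beta_1 q_1 \sum_{i=1}^{\infty} R_i \tag{31} \\
(\lambda_2 + \beta_1) R_k &= \lambda_2 R_{k-1}, \qquad k \ge 1 \tag{32} \\
(\lambda_2 + \beta_1) R_0 &= \lambda_2 q_2 \sum_{i=1}^{\infty} P_i + \beta_2 q_2 \sum_{i=1}^{\infty} S_i \tag{33} \\
(\lambda_1 + \lambda_2) P_k &= \lambda_1 P_{k-1} + \lambda_2 \bar{q}_2 P_{k+1} + \beta_2 \bar{q}_2 S_{k+1}, \qquad k \ge 2 \tag{34} \\
(\lambda_1 + \lambda_2) P_1 &= \lambda_1 P_0 + \lambda_2 \bar{q}_2 P_2 + \beta_2 \bar{q}_2 S_2 + \beta_1 R_0 \tag{35} \\
(\lambda_1 + \lambda_2) P_0 &= \lambda_1 \bar{q}_1 Q_1 + \lambda_2 \bar{q}_2 P_1 + \beta_1 \bar{q}_1 R_1 + \beta_2 \bar{q}_2 S_1 \tag{36} \\
(\lambda_2 + \lambda_1) Q_k &= \lambda_2 Q_{k-1} + \lambda_1 \bar{q}_1 Q_{k+1} + \beta_1 \bar{q}_1 R_{k+1}, \qquad k \ge 2 \tag{37} \\
(\lambda_2 + \lambda_1) Q_1 &= \lambda_2 P_0 + \lambda_1 \bar{q}_1 Q_2 + \beta_1 \bar{q}_1 R_2 + \beta_2 S_0 \tag{38} \\
1 &= P_0 + \sum_{i=1}^{\infty} P_i + \sum_{i=1}^{\infty} Q_i + \sum_{i=0}^{\infty} S_i + \sum_{i=0}^{\infty} R_i \tag{39}
\end{align}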
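The balance equations can also be checked numerically. The sketch below is not part of the original analysis: it builds the transition structure implied by the balance equations (with rates $\lambda_i$, $\beta_i = f\lambda_i$ and message probabilities $q_i$), truncates the chain at $|D| = K$, and iterates the uniformized chain until the probabilities settle. The function name `stationary`, the truncation level `K`, the iteration count and the uniformization constant are all assumptions made for illustration.

```python
def stationary(lam1, lam2, q1, q2, f, K=60, iters=3000):
    """Approximate equilibrium probabilities of the rollback-cost chain.

    States are tuples: ('P', k) for k = 0..K, where ('P', 0) is P_0,
    ('Q', k) for k = 1..K, and ('S', k), ('R', k) for k = 0..K.
    The chain is truncated at |D| = K (an assumption of this sketch).
    """
    b1, b2 = f * lam1, f * lam2           # post-rollback rates beta_1, beta_2
    trans = []                            # (source, target, rate) triples

    def lead2(k):                         # Q_k, with Q_0 identified with P_0
        return ('Q', k) if k > 0 else ('P', 0)

    for k in range(K + 1):
        if k < K:                         # the processor in front advances
            trans.append((('P', k), ('P', k + 1), lam1))
            trans.append((lead2(k), lead2(k + 1), lam2))
            trans.append((('S', k), ('S', k + 1), lam1))
            trans.append((('R', k), ('R', k + 1), lam2))
        if k >= 1:                        # the processor behind advances
            trans.append((('P', k), ('P', k - 1), lam2 * (1 - q2)))
            trans.append((('P', k), ('R', 0), lam2 * q2))
            trans.append((('Q', k), lead2(k - 1), lam1 * (1 - q1)))
            trans.append((('Q', k), ('S', 0), lam1 * q1))
            trans.append((('S', k), ('P', k - 1), b2 * (1 - q2)))
            trans.append((('S', k), ('R', 0), b2 * q2))
            trans.append((('R', k), lead2(k - 1), b1 * (1 - q1)))
            trans.append((('R', k), ('S', 0), b1 * q1))
    trans.append((('S', 0), ('Q', 1), b2))   # rollback event done, processor 2 leads
    trans.append((('R', 0), ('P', 1), b1))   # rollback event done, processor 1 leads

    states = sorted({s for s, _, _ in trans} | {t for _, t, _ in trans})
    idx = {s: i for i, s in enumerate(states)}
    unif = 2.0 * (lam1 + lam2)            # uniformization constant (keeps self-loops)
    moves = [(idx[s], idx[t], r / unif) for s, t, r in trans if r > 0.0]

    pi = [1.0 / len(states)] * len(states)
    for _ in range(iters):
        new = pi[:]
        for i, j, p in moves:
            flow = pi[i] * p              # probability mass moved along this arc
            new[i] -= flow
            new[j] += flow
        pi = new
    return {s: pi[idx[s]] for s in states}
```

Calling `stationary(1.0, 1.0, 0.25, 0.25, 1.0)` returns a dictionary keyed by state, so `result[('P', 0)]` approximates $P_0$ and can be compared with the closed-form solution derived in this section.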


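As before we define the following z-transforms (note, $S(z)$ and $R(z)$ are defined from $k = 1$, not $k = 0$ as
in the previous model):

\[
P(z) = \sum_{k=1}^{\infty} P_k z^k \qquad Q(z) = \sum_{k=1}^{\infty} Q_k z^k \qquad
S(z) = \sum_{k=1}^{\infty} S_k z^k \qquad R(z) = \sum_{k=1}^{\infty} R_k z^k .
\]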


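We proceed to find $P(z)$, $Q(z)$, $S(z)$ and $R(z)$ by multiplying the appropriate equation above by $z^k$ and
summing over the valid range of $k$. This leads to

\[
\begin{aligned}
P(z) &= \frac{-A a (P_0 + R_0 f) z^2 - \bar{a}\bar{q}_2 \left( A f S(z) - A P_1 z - S_0 a f z \right)}
             {A \left( \bar{a}\bar{q}_2 - z + a z^2 \right)} \\
Q(z) &= \frac{-B \bar{a} (P_0 + S_0 f) z^2 - a\bar{q}_1 \left( B f R(z) - B Q_1 z - R_0 \bar{a} f z \right)}
             {B \left( a\bar{q}_1 - z + \bar{a} z^2 \right)} \\
S(z) &= \frac{S_0 a z}{A - a z} \\
R(z) &= \frac{R_0 \bar{a} z}{B - \bar{a} z}
\end{aligned}
\]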
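As a worked instance of this step, the transform $S(z)$ follows from Equation 30 alone. This short derivation is added for illustration and assumes only the definitions $a = \lambda_1/(\lambda_1+\lambda_2)$ and $A = a + \bar{a}f$ given above; dividing Equation 30 by $\lambda_1 + \lambda_2$ turns it into a geometric recursion.

```latex
\begin{align*}
A\,S_k &= a\,S_{k-1} \quad (k \ge 1)
  \;\Longrightarrow\; S_k = S_0 \left(\frac{a}{A}\right)^{k}, \\
S(z) &= \sum_{k=1}^{\infty} S_0 \left(\frac{a z}{A}\right)^{k}
      = S_0\,\frac{a z / A}{1 - a z / A}
      = \frac{S_0\, a z}{A - a z}, \qquad |z| < \frac{A}{a}.
\end{align*}
```

The same argument applied to Equation 32, with $a$ and $A$ replaced by $\bar{a}$ and $B$, gives $R(z)$.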


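Substituting the value for $S(z)$ into the equation for $P(z)$ we arrive at the following equation which
defines $P(z)$.

\begin{equation}
P(z) = \frac{z \left( -S_0 a^2 \bar{a} f \bar{q}_2 z + A P_1 \bar{a}\bar{q}_2 (A - a z) - A a (P_0 + R_0 f) z (A - a z) \right)}
            {A (A - a z) \left( \bar{a}\bar{q}_2 - z + a z^2 \right)} \tag{40}
\end{equation}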
The denominator of $P(z)$ can be factored into $aA(A - az)(z - r_1)(z - r_2)$ and the denominator of $Q(z)$ into
$\bar{a}B(B - \bar{a}z)(z - s_1)(z - s_2)$ where

\[
(r_1, r_2) = \frac{1 \pm \sqrt{1 - 4a\bar{a}\bar{q}_2}}{2a}
\qquad
(s_1, s_2) = \frac{1 \pm \sqrt{1 - 4a\bar{a}\bar{q}_1}}{2\bar{a}} .
\]


It is simple to show that $r_1$ and $r_2$ are real and that $r_1 > 1$ while $0 < r_2 < 1$ [KF92]. Since $P(z)$ must be
analytic in the region $|z| < 1$, the numerator of $P(z)$ must go to zero when $z = r_2$. Using this information
we solve for $P_1$.

\[
P_1 = \frac{a r_2 \left( S_0 a \bar{a} f \bar{q}_2 + A (P_0 + R_0 f)(A - a r_2) \right)}{A \bar{a} \bar{q}_2 (A - a r_2)}
\]
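The stated properties of the roots are easy to confirm numerically. The following sketch is an illustration only (the function name `roots` is hypothetical): it evaluates $r_1, r_2$ as the roots of $a z^2 - z + \bar{a}\bar{q}_2 = 0$ and $s_1, s_2$ as the roots of $\bar{a} z^2 - z + a\bar{q}_1 = 0$, assuming $\lambda_i > 0$ and $0 < q_i \le 1$.

```python
import math


def roots(lam1, lam2, q1, q2):
    """Return (r1, r2, s1, s2) for the rollback cost model.

    r1, r2 solve a*z^2 - z + abar*(1 - q2) = 0;
    s1, s2 solve abar*z^2 - z + a*(1 - q1) = 0.
    """
    a = lam1 / (lam1 + lam2)
    abar = 1.0 - a
    disc_r = math.sqrt(1.0 - 4.0 * a * abar * (1.0 - q2))
    disc_s = math.sqrt(1.0 - 4.0 * a * abar * (1.0 - q1))
    r1, r2 = (1.0 + disc_r) / (2.0 * a), (1.0 - disc_r) / (2.0 * a)
    s1, s2 = (1.0 + disc_s) / (2.0 * abar), (1.0 - disc_s) / (2.0 * abar)
    return r1, r2, s1, s2
```

Because $4a\bar{a} \le 1$, the discriminants are at least $q_2$ and $q_1$ respectively, so the square roots are real whenever the message probabilities are positive.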

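We substitute this value back into the equation for $P(z)$ and arrive at

\begin{equation}
P(z) = \frac{z \left( S_0 a \bar{a} f \bar{q}_2 + (P_0 + R_0 f)(A - a r_2)(A - a z) \right)}
            {(A - a r_2)(r_1 - z)(A - a z)} \tag{41}
\end{equation}

Similarly for $Q(z)$ we find

\begin{equation}
Q(z) = \frac{z \left( R_0 a \bar{a} f \bar{q}_1 + (P_0 + S_0 f)(B - \bar{a} s_2)(B - \bar{a} z) \right)}
            {(B - \bar{a} s_2)(s_1 - z)(B - \bar{a} z)} \tag{42}
\end{equation}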


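Our task now is to find the values for the unknown constants $P_0$, $S_0$ and $R_0$. We can solve the equations
for $S_0$ (31) and $R_0$ (33) simultaneously to find

\[
\begin{aligned}
S_0 &= \frac{q_1 \left( \bar{a}^2 q_2 P(1) + B a\, Q(1) \right)}{AB - a\bar{a} q_1 q_2} \\
R_0 &= \frac{q_2 \left( A \bar{a}\, P(1) + a^2 q_1 Q(1) \right)}{AB - a\bar{a} q_1 q_2}
\end{aligned}
\]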

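The above values are substituted into the equations for $P(z)$ and $Q(z)$ and we find $P(1)$ and $Q(1)$ by solving
Equations 41 and 42 simultaneously with $z = 1$ to arrive at

\[
P(1) = C_P P_0 \qquad Q(1) = C_Q P_0
\]

where

\[
\begin{aligned}
C_P &= \frac{C_{pp_0} + C_{pq} C_{qp_0} - C_{pp_0} C_{qq}}{1 - C_{pp} - C_{pq} C_{qp} - C_{qq} + C_{pp} C_{qq}} \\
C_Q &= \frac{C_{pp_0} C_{qp} + C_{qp_0} - C_{pp} C_{qp_0}}{1 - C_{pp} - C_{pq} C_{qp} - C_{qq} + C_{pp} C_{qq}}
\end{aligned}
\]

and

\[
\begin{aligned}
C_{pp_0} &= \frac{1}{r_1 - 1} \\
C_{pp} &= \frac{\bar{a} q_2 \left( A f A + a\bar{a} q_1 \bar{q}_2 - a f A r_2 \right)}{(AB - a\bar{a} q_1 q_2)(r_1 - 1)(A - a r_2)} \\
C_{pq} &= \frac{a^2 q_1 \left( A f q_2 + B \bar{q}_2 - a f q_2 r_2 \right)}{(AB - a\bar{a} q_1 q_2)(r_1 - 1)(A - a r_2)} \\
C_{qp_0} &= \frac{1}{s_1 - 1} \\
C_{qp} &= \frac{\bar{a}^2 q_2 \left( B f q_1 + A \bar{q}_1 - \bar{a} f q_1 s_2 \right)}{(AB - a\bar{a} q_1 q_2)(s_1 - 1)(B - \bar{a} s_2)} \\
C_{qq} &= \frac{a q_1 \left( B f B + a\bar{a} \bar{q}_1 q_2 - \bar{a} f B s_2 \right)}{(AB - a\bar{a} q_1 q_2)(s_1 - 1)(B - \bar{a} s_2)} .
\end{aligned}
\]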


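$P_0$ is derived from the fact that the probabilities must sum to 1.

Finally, the equations for $P(z)$, $Q(z)$, $S(z)$ and $R(z)$ can be inverted to find the complete solution to the
Markov chain.

\begin{align}
S_k &= S_0 \left( \frac{a}{A} \right)^{k}, \qquad k \ge 0 \tag{43} \\
P_k &= (P_0 + R_0 f) \left( \frac{1}{r_1} \right)^{k}
      + \frac{S_0 a \bar{a} f \bar{q}_2}{(A - a r_1)(A - a r_2)}
        \left( \left( \frac{1}{r_1} \right)^{k} - \left( \frac{a}{A} \right)^{k} \right), \qquad k \ge 1 \tag{44} \\
R_k &= R_0 \left( \frac{\bar{a}}{B} \right)^{k}, \qquad k \ge 0 \tag{45} \\
Q_k &= (P_0 + S_0 f) \left( \frac{1}{s_1} \right)^{k}
      + \frac{R_0 a \bar{a} f \bar{q}_1}{(B - \bar{a} s_1)(B - \bar{a} s_2)}
        \left( \left( \frac{1}{s_1} \right)^{k} - \left( \frac{\bar{a}}{B} \right)^{k} \right), \qquad k \ge 1 \tag{46}
\end{align}

\[
\begin{aligned}
S_0 &= C_{S_0} P_0 & C_{S_0} &= \frac{q_1 \left( C_Q a B + C_P \bar{a}^2 q_2 \right)}{AB - a\bar{a} q_1 q_2} \\
R_0 &= C_{R_0} P_0 & C_{R_0} &= \frac{\left( C_P \bar{a} A + C_Q a^2 q_1 \right) q_2}{AB - a\bar{a} q_1 q_2}
\end{aligned}
\]

\begin{equation}
P_0 = \left( 1 + C_{R_0} + C_{S_0} + \frac{a C_{S_0}}{\bar{a} f} + \frac{\bar{a} C_{R_0}}{a f}
      + \frac{1 + C_{R_0} f}{r_1 - 1} + \frac{1 + C_{S_0} f}{s_1 - 1}
      + \frac{a C_{S_0} \bar{q}_2}{(r_1 - 1)(A - a r_2)}
      + \frac{\bar{a} C_{R_0} \bar{q}_1}{(s_1 - 1)(B - \bar{a} s_2)} \right)^{-1} \tag{47}
\end{equation}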
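3.3 Performance Measures

3.3.1 State Buffer Use

Using the state probabilities we find the average state buffer occupancy at processors one and two.

\begin{align}
\sum_{i=1}^{\infty} i (P_i + S_i) &= \frac{A S_0 a}{\bar{a}^2 f^2} + \frac{(P_0 + R_0 f) r_1}{(r_1 - 1)^2}
  + \frac{S_0 a \bar{a} f \bar{q}_2}{(A - a r_1)(A - a r_2)}
    \left( \frac{r_1}{(r_1 - 1)^2} - \frac{A a}{\bar{a}^2 f^2} \right) \tag{48} \\
\sum_{j=1}^{\infty} j (Q_j + R_j) &= \frac{B R_0 \bar{a}}{a^2 f^2} + \frac{(P_0 + S_0 f) s_1}{(s_1 - 1)^2}
  + \frac{R_0 a \bar{a} f \bar{q}_1}{(B - \bar{a} s_1)(B - \bar{a} s_2)}
    \left( \frac{s_1}{(s_1 - 1)^2} - \frac{B \bar{a}}{a^2 f^2} \right) \tag{49}
\end{align}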


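As with the previous model we also find $\theta_{i,b}$, the probability that a fixed sized buffer of size $b \ge 1$
overflows.

\[
\begin{aligned}
\theta_{1,b} &= \sum_{i=b+1}^{\infty} (P_i + S_i) \\
  &= \frac{S_0 a}{\bar{a} f} \left( \frac{a}{A} \right)^{b} + \frac{P_0 + R_0 f}{(r_1 - 1) r_1^{b}}
   + \frac{S_0 a \bar{a} f \bar{q}_2}{(A - a r_1)(A - a r_2)}
     \left( \frac{1}{(r_1 - 1) r_1^{b}} - \frac{a}{\bar{a} f} \left( \frac{a}{A} \right)^{b} \right) \\
\theta_{2,b} &= \sum_{i=b+1}^{\infty} (Q_i + R_i) \\
  &= \frac{R_0 \bar{a}}{a f} \left( \frac{\bar{a}}{B} \right)^{b} + \frac{P_0 + S_0 f}{(s_1 - 1) s_1^{b}}
   + \frac{R_0 a \bar{a} f \bar{q}_1}{(B - \bar{a} s_1)(B - \bar{a} s_2)}
     \left( \frac{1}{(s_1 - 1) s_1^{b}} - \frac{\bar{a}}{a f} \left( \frac{\bar{a}}{B} \right)^{b} \right)
\end{aligned}
\]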


3.3.2 Speedup 

From the complete solution of the Markov chain we calculate the speedup $S$ of the two processor TW system
over an equivalent single processor. The speedup is simply the rate of progress of the two processor system $\delta_2$
divided by the rate of progress for a single processor system $\delta_1$. The rate of forward progress for one
processor is defined simply as the average rate of progress of the two processes

\[
\delta_1 = \frac{\lambda_1 + \lambda_2}{2} .
\]

At this point we add an additional cost for state saving by allowing a single processor to move at a rate
which is $C$ times faster than the TW processors. Thus, state saving increases the average execution time of
an event from $1/\lambda_i$ to $C/\lambda_i$ when running TW. The revised rate of progress for a single processor is

\[
\delta_1 = \frac{C(\lambda_1 + \lambda_2)}{2}
\]
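while the rate of progress for the two-processor TW system is found from the following equation.

\[
\begin{aligned}
\delta_2 ={} & (\lambda_1 + \lambda_2) P_0 + (\lambda_1 + \beta_2) S_0 + (\lambda_2 + \beta_1) R_0 \\
 & + (\lambda_1 + \lambda_2) P(1) + (\lambda_2 + \lambda_1) Q(1) + (\lambda_1 + \beta_2) S(1) + (\lambda_2 + \beta_1) R(1) \\
 & - \lambda_2 q_2 \sum_{k=1}^{\infty} P_k (k - 1) - \lambda_1 q_1 \sum_{k=1}^{\infty} Q_k (k - 1) \\
 & - \beta_2 q_2 \sum_{k=1}^{\infty} S_k (k - 1) - \beta_1 q_1 \sum_{k=1}^{\infty} R_k (k - 1)
\end{aligned}
\]

Taking the ratio $S = \delta_2 / \delta_1$ (i.e., the speedup) we arrive at

\begin{equation}
\begin{aligned}
S = \frac{2}{C} \Biggl[ 1 & - \frac{1 - f}{f} \left( A S_0 + B R_0 \right) \\
 & - \bar{a} q_2 \Biggl( \frac{a^2 S_0}{\bar{a}^2 f} + \frac{P_0 + R_0 f}{(r_1 - 1)^2}
   + \frac{S_0 a \bar{a} f \bar{q}_2}{(A - a r_1)(A - a r_2)}
     \left( \frac{1}{(r_1 - 1)^2} - \frac{a^2}{\bar{a}^2 f^2} \right) \Biggr) \\
 & - a q_1 \Biggl( \frac{\bar{a}^2 R_0}{a^2 f} + \frac{P_0 + S_0 f}{(s_1 - 1)^2}
   + \frac{R_0 a \bar{a} f \bar{q}_1}{(B - \bar{a} s_1)(B - \bar{a} s_2)}
     \left( \frac{1}{(s_1 - 1)^2} - \frac{\bar{a}^2}{a^2 f^2} \right) \Biggr) \Biggr]
\end{aligned} \tag{52}
\end{equation}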
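To make the chain of formulas concrete, the following sketch evaluates the closed-form solution and the resulting speedup for given parameters. It is an illustration under stated assumptions, not the authors' code: the function name `closed_form` and all variable names are hypothetical, the coefficients follow the expressions for $C_{pp}$, $C_{pq}$, $C_{qp}$, $C_{qq}$, $C_{S_0}$ and $C_{R_0}$ as written in this section, and the parameters are assumed to avoid the removable singularities at $A = a r_1$ and $B = \bar{a} s_1$.

```python
import math


def closed_form(lam1, lam2, q1, q2, f, C=1.0):
    """Return (P0, S0, R0, speedup) for the rollback cost model."""
    a = lam1 / (lam1 + lam2)
    ab = 1.0 - a                                   # a-bar
    A, B = a + ab * f, ab + a * f
    q1b, q2b = 1.0 - q1, 1.0 - q2                  # q-bar values
    dr = math.sqrt(1.0 - 4.0 * a * ab * q2b)
    ds = math.sqrt(1.0 - 4.0 * a * ab * q1b)
    r1, r2 = (1 + dr) / (2 * a), (1 - dr) / (2 * a)
    s1, s2 = (1 + ds) / (2 * ab), (1 - ds) / (2 * ab)
    D = A * B - a * ab * q1 * q2

    # P(1) = cpp0*P0 + cpp*P(1) + cpq*Q(1), and likewise for Q(1).
    dp = D * (r1 - 1) * (A - a * r2)
    dq = D * (s1 - 1) * (B - ab * s2)
    cpp0, cqp0 = 1 / (r1 - 1), 1 / (s1 - 1)
    cpp = ab * q2 * (A * f * A + a * ab * q1 * q2b - a * f * A * r2) / dp
    cpq = a * a * q1 * (A * f * q2 + B * q2b - a * f * q2 * r2) / dp
    cqp = ab * ab * q2 * (B * f * q1 + A * q1b - ab * f * q1 * s2) / dq
    cqq = a * q1 * (B * f * B + a * ab * q1b * q2 - ab * f * B * s2) / dq
    det = 1 - cpp - cqq - cpq * cqp + cpp * cqq
    CP = (cpp0 + cpq * cqp0 - cpp0 * cqq) / det
    CQ = (cqp0 + cqp * cpp0 - cqp0 * cpp) / det
    CS0 = q1 * (CQ * a * B + CP * ab * ab * q2) / D
    CR0 = q2 * (CP * ab * A + CQ * a * a * q1) / D

    # Normalization: P0 + P(1) + Q(1) + S0*A/(ab*f) + R0*B/(a*f) = 1.
    P0 = 1 / (1 + CP + CQ + CS0 * A / (ab * f) + CR0 * B / (a * f))
    S0, R0 = CS0 * P0, CR0 * P0

    # Sums of (k - 1) times the state probabilities (rollback distances).
    kp = S0 * a * ab * f * q2b / ((A - a * r1) * (A - a * r2))
    kq = R0 * a * ab * f * q1b / ((B - ab * s1) * (B - ab * s2))
    sum_p = (P0 + R0 * f) / (r1 - 1) ** 2 + kp * (1 / (r1 - 1) ** 2 - (a / (ab * f)) ** 2)
    sum_q = (P0 + S0 * f) / (s1 - 1) ** 2 + kq * (1 / (s1 - 1) ** 2 - (ab / (a * f)) ** 2)
    sum_s = S0 * (a / (ab * f)) ** 2
    sum_r = R0 * (ab / (a * f)) ** 2

    # delta_2 / (lam1 + lam2): forward progress minus progress lost to rollbacks.
    rate = (1 - (1 - f) * (A * S0 + B * R0) / f
            - ab * q2 * (sum_p + f * sum_s)
            - a * q1 * (sum_q + f * sum_r))
    return P0, S0, R0, 2 * rate / C
```

In the symmetric, balanced case the speedup returned here should agree with the closed form of Equation 53, which gives a convenient consistency check on the reconstruction.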


We note here that this measure $S$ is different from the normalized rate of progress used with the message
queueing model in Section 2.3.3. In that model we were unable to calculate the average rate of progress for
a single processor due to the effect of messages. Since messages now carry work, it would be unfair to compare
the two processor TW system to a single processor system without messages. The TW system would be doing more
work. On the other hand, it was non-trivial to attempt to account for this extra work caused by messages and
add it to the single processor system. We finally settled on a measure which was a normalized rate of progress
by dividing the rate of progress for two processors by $(\lambda_1 + \lambda_2)$. For the rollback cost system the rate of
progress on a single processor is well defined and therefore, we use a speedup measure $S$.

For the Symmetric, Balanced case where $\lambda_1 = \lambda_2 = \lambda$ and $q_1 = q_2 = q$ we get the following equation for
speedup.

\begin{equation}
S = \frac{4 f \left( f + \sqrt{q} \right)}
         {C \left( 2 f^2 + f (2 + f) \sqrt{q} + (2 - f) f q + 2 (1 - f) q^{3/2} \right)} \tag{53}
\end{equation}


We show a plot of this function in Figure 9 for C = 1. 

Using this simple formula for speedup we find the values of $f$, $q$, and $C$ which allow two processors
running TW to progress faster than a single processor without TW. This is the region where $S > 1$. We
solve Equation 53 for $C$ when $S > 1$ resulting in the inequality

\[
C < \frac{4 f \left( f + \sqrt{q} \right)}
         {2 f^2 + f (2 + f) \sqrt{q} + (2 - f) f q + 2 (1 - f) q^{3/2}} .
\]

Therefore, we find that $C$ must lie below the surface plotted in Figure 9 for $S > 1$. It is clear for $C > 2$ that
TW on two processors is always slower than using a single processor without TW. Further, since $C$ must be
greater than or equal to one (cost of state saving is $\ge 0$), there is a region in the $q$-$f$ space where speedup
is not possible. That is the shaded region shown in Figure 10.
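The symmetric speedup formula and the resulting bound on $C$ are simple enough to evaluate directly. The sketch below is illustrative only; the function names `speedup_symmetric` and `max_state_saving_cost` are hypothetical, and the inputs are assumed to satisfy $0 < f \le 1$ and $0 \le q \le 1$.

```python
import math


def speedup_symmetric(f, q, C=1.0):
    """Speedup of two-processor TW in the symmetric, balanced case (Equation 53)."""
    s = math.sqrt(q)
    den = 2 * f * f + f * (2 + f) * s + (2 - f) * f * q + 2 * (1 - f) * q * s
    return 4 * f * (f + s) / (C * den)


def max_state_saving_cost(f, q):
    """Upper bound on C below which S > 1 holds for the given f and q."""
    return speedup_symmetric(f, q, 1.0)
```

With no messages ($q = 0$) the function returns $2/C$, and with $f = 1$ it reduces to $4 / (C(2 + \sqrt{q}))$; both limits bound the speedup by $2/C$, which is why no speedup is possible once $C$ exceeds 2.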

Figure 9: Speedup versus q and f for the Symmetric, Balanced Case when C = 1.

Since rollbacks can be costly ($C > 1$), there may be an advantage to slowing down or stopping the faster
processor when it gets ahead so as to avoid rollbacks. Mitra and Mitrani [MM84], using their optimization
function $J = D - cR$ (see Section 1.1), find regions of the parameter space where the maximum of the
function is found at the boundary where the processors have zero processing capacity (don't perform the
task at all). Essentially, they found that Time Warp could perform poorly if the cost for rollback was high.
We, on the other hand, will look to improve TW by slowing down or stopping the processor which gets too
far ahead. Looking again at the Symmetric, Balanced case where $\lambda_1 = \lambda_2 = \lambda$ and $q_1 = q_2 = q$ we find that
region of $q$-$f$ space where it is better for a processor to stop processing when it gets exactly one step ahead.
The state diagram for such a system is shown in Figure 11. Each processor will stop when it gets exactly
one step ahead of the other processor. There will be no rollbacks and therefore no need for state saving.
When $\lambda_1 = \lambda_2 = \lambda$ we find that $P_1 = Q_1 = P_0 = 1/3$, and that speedup over the equivalent single processor
system is 4/3. Therefore, we can always get a speedup of 4/3 regardless of the values of $f$, $q$ and $C$. For
general values of $\lambda_1$ and $\lambda_2$ the speedup is

\[
\frac{4 (1 - a) a}{1 - a + a^2}
\]

which has its maximum of 4/3 at $a = 1/2$. For the Symmetric, Balanced case we show in Figure 10 the area
of the $q$-$f$ plane where waiting at one step is better than rushing ahead when $C = 1$. Fortunately this
area includes all the area where we would not have been able to get speedup with two processors. Finally, in
Figure 12 we show the achievable speedup when $C = 1$. The shaded region is where a processor waits when
it gets one step ahead of the other. In the unshaded region, if $C$ is less than the value plotted in the figure
we are able to gain at least some speedup over the equivalent single processor not running Time Warp.
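The comparison between waiting at one step and rushing ahead can be sketched in a few lines. This is an illustration only: the function names are hypothetical, `speedup_symmetric` restates the symmetric speedup formula (Equation 53) so that the block is self-contained, and the comparison is restricted to the symmetric, balanced case.

```python
import math


def speedup_wait(a):
    """Speedup when each processor stops once it is exactly one step ahead."""
    return 4 * (1 - a) * a / (1 - a + a * a)


def speedup_symmetric(f, q, C=1.0):
    """Speedup of standard two-processor TW, symmetric balanced case (Equation 53)."""
    s = math.sqrt(q)
    den = 2 * f * f + f * (2 + f) * s + (2 - f) * f * q + 2 * (1 - f) * q * s
    return 4 * f * (f + s) / (C * den)


def waiting_is_better(f, q, C=1.0):
    """True when stopping at one step ahead beats rushing ahead (a = 1/2)."""
    return speedup_wait(0.5) > speedup_symmetric(f, q, C)
```

Evaluating `waiting_is_better` over a grid of $(q, f)$ values for a fixed $C$ is one way to trace out a region of the kind described for Figure 10.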

Figure 10: Region of q-f space where speedup is possible.

Since it sometimes pays to stop a processor when it gets one step ahead, we might surmise that there
are ranges of the parameters where stopping a processor when it gets $k$ ($k > 1$) steps ahead improves
performance. For our model, this turns out not to be the case. By examining the Markov chain for $k = 2$ we
find that the speedup is never greater than the speedup gained by the standard algorithm. Therefore, it is
never practical to stop a processor once it gets more than one step ahead. The Markov chain in Figure 11 is
unique in the respect that at no point in time will a processor incur a cost for state saving or rollback. Once
we allow the processors to get more than one step out of synchronization, we must save state since rollbacks
are possible. Intuitively, the fact that we might only stop at one step ahead makes sense since a process at
virtual time $v$ can only send a message to the other at time $v + 1$. By getting two or more steps ahead, a
rollback is already possible and we will incur a cost for rollback if a message is sent regardless of whether
we wait further down the line. Waiting now only causes the system to have a smaller speedup. In a more
general system where a processor may send a message at an arbitrary point in the future we may find that
there are regions of the parameter space where it pays to stop a processor when it gets further than one step
ahead. We are currently extending the rollback cost model so that the processors are able to make arbitrary
sized jumps when advancing (not restricted to single steps). This model will give us a better opportunity to
examine the improvements we might gain by stopping or slowing down the lead processor when it gets more
than one step ahead.





Figure 11: State diagram when each processor stops when one step ahead.

3.4 Summary 

Our second model incorporated costs for rollback and state saving. In addition to calculating the complete 
solution to the Markov chain and the speedup over a single processor, we were able to find regions of the 
parameter space where it was better to stop either processor when it was exactly one step ahead. We could 
also show that stopping the lead processor when it was two or more steps ahead led to no performance gain. 
As with our previous model, since we have the exact solution to the Markov chain, we are able to calculate
nearly any performance measure of interest. 


4 Conclusions and Future Work 

In this paper we presented two new models to extend our understanding of the Time Warp distributed 
simulation protocol when it runs on two processors. Our first model allowed messages to be queued which had 
not been previously addressed in any of the work on two-processor models. Our second model incorporated 
costs for both rollback and state saving. In this second model we were able to find regions of the parameter 
space where it was better to stop a processor when it got ahead of the other one rather than let it rush ahead 
and potentially incur a cost for state saving and rollback. Both models have given us a clearer and more
thorough understanding of the operation of systems synchronized by rollback when run on two processors. 

In addition to extending the rollback cost model to accommodate arbitrary sized state advances, our 
future work should be in the area of extensions to multiple processors. Extending our Markov chain approach
has proven to be unwieldy, and we are currently pursuing approximations for multiple processors based on
our current work. 
A Solution to the Cubic Equation

This material is taken directly from the CRC Handbook of Mathematical Sciences [Bey87].
A cubic equation, $y^3 + p_y y^2 + q_y y + r_y = 0$, may be reduced to the form

\[
x^3 + a_x x + b_x = 0
\]

by substituting for $y$ the value $x - p_y/3$. Here

\[
a_x = \frac{1}{3} \left( 3 q_y - p_y^2 \right) \quad \text{and} \quad
b_x = \frac{1}{27} \left( 2 p_y^3 - 9 p_y q_y + 27 r_y \right) .
\]

The form $x^3 + ax + b = 0$ with $ab \ne 0$ can always be solved by transforming it to the trigonometric
identity

\[
4\cos^3(\theta) - 3\cos(\theta) - \cos(3\theta) = 0 .
\]

Let $x = m\cos(\theta)$, then

\[
\begin{aligned}
x^3 + ax + b &= 0 \\
 &= m^3 \cos^3(\theta) + a m \cos(\theta) + b \\
 &= 4\cos^3(\theta) - 3\cos(\theta) - \cos(3\theta) \\
 &= 0 .
\end{aligned}
\]

These values can then be substituted into the solutions given above to find $r_1$, $r_2$, and $r_3$. The values for $s_j$
are symmetric in $(a, \bar{a})$ and $(q_1, q_2)$ to the $r_i$ values.
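The trigonometric reduction can be turned into a short routine. The sketch below is an illustration, not part of the handbook text: the function name `cubic_roots_trig` is hypothetical, and it assumes the cubic has three real roots. Matching $m^3\cos^3(\theta) + am\cos(\theta) + b$ against the identity term by term gives $m = 2\sqrt{-a/3}$ and $\cos(3\theta) = 3b/(am)$, which is what the code uses.

```python
import math


def cubic_roots_trig(p, q, r):
    """Real roots of y^3 + p*y^2 + q*y + r = 0 by the trigonometric method.

    Assumes three real roots, i.e. the reduced coefficient a is negative
    and |3b / (a m)| <= 1.
    """
    a = (3.0 * q - p * p) / 3.0                        # reduced form x^3 + a x + b
    b = (2.0 * p ** 3 - 9.0 * p * q + 27.0 * r) / 27.0
    if a >= 0.0:
        raise ValueError("trigonometric form needs a < 0")
    m = 2.0 * math.sqrt(-a / 3.0)
    c = 3.0 * b / (a * m)                              # cos(3 * theta)
    if abs(c) > 1.0 + 1e-12:
        raise ValueError("cubic does not have three real roots")
    c = max(-1.0, min(1.0, c))                         # guard against rounding
    theta = math.acos(c) / 3.0
    # The three angles theta + 2*pi*k/3 give the three roots; undo y = x - p/3.
    return sorted(m * math.cos(theta + 2.0 * math.pi * k / 3.0) - p / 3.0
                  for k in range(3))
```

For example, `cubic_roots_trig(-6.0, 11.0, -6.0)` solves $y^3 - 6y^2 + 11y - 6 = 0$, whose roots are 1, 2 and 3.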